Galaxy / Custom Analysis Tools
===
###### tags: `bioinformatics`, `bioinformatics computing platform`, `Galaxy`, `genomics`
<br>
Table of contents:
[TOC]
<br>
:::info
:bulb: **Key documents**
- **[Adding custom tools to Galaxy](https://galaxyproject.org/admin/tools/add-tool-tutorial/)**
- **[Galaxy Tool XML File](https://docs.galaxyproject.org/en/latest/dev/schema.html)**
:::
<br>
## Adding a custom tool
### Step 1 - Write and test the custom tool
- Create toolExample.pl
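```perl
#!/usr/bin/perl -w
# usage: perl toolExample.pl <FASTA file> <output file>
# Writes the GC fraction of each sequence in the FASTA file, one value per line.
use strict;

open(my $in,  '<', $ARGV[0]) or die "Cannot read $ARGV[0]: $!";
open(my $out, '>', $ARGV[1]) or die "Cannot write $ARGV[1]: $!";

my ($gc, $length) = (0, 0);
while (<$in>) {
    chomp;    # remove the newline only; chop could eat a base on an unterminated last line
    if (m/^>/) {
        # New header: report the previous sequence (if any), then reset the counters.
        printf {$out} "%.3f\n", $gc / $length if $. > 1 && $length;
        ($gc, $length) = (0, 0);
    } else {
        ++$gc while m/[gc]/ig;    # count G and C, case-insensitively
        $length += length $_;
    }
}
# Report the last sequence; skipped when the file held no sequence data.
printf {$out} "%.3f\n", $gc / $length if $length;
close($in);
close($out);
```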
- Download the 2019-nCoV sequences to use as a test case
[Libraries](https://usegalaxy.org/library/list#) / [2019_nCoV](https://usegalaxy.org/library/list#/folders/Fe878daae442969ff) / [Assembled genomes](https://usegalaxy.org/library/list#folders/Fbd0d83390997b5b1) / nCoV_Jan31.fa
- Test locally
```bash
perl toolExample.pl nCoV_Jan31.fa out.txt
```
- Output
```
0.380
0.380
0.380
0.380
0.380
0.380
0.380
0.380
0.380
0.380
0.380
0.380
0.380
0.380
0.380
0.380
0.380
```
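As a cross-check on the Perl script, the same per-sequence GC calculation can be sketched in Python. The function name `gc_content` and the demo records are made up for illustration; unlike the Perl version, this sketch also keeps the sequence names.

```python
from typing import Dict, Iterable, List


def gc_content(lines: Iterable[str]) -> Dict[str, float]:
    """Return {sequence name: GC fraction} for FASTA-formatted lines."""
    counts: Dict[str, List[int]] = {}  # name -> [gc_count, total_length]
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: start a new record (duplicate names would overwrite).
            current = line[1:]
            counts[current] = [0, 0]
        elif current is not None:
            seq = line.upper()
            counts[current][0] += seq.count("G") + seq.count("C")
            counts[current][1] += len(seq)
    # Records without any sequence data are skipped to avoid dividing by zero.
    return {name: gc / total for name, (gc, total) in counts.items() if total}


if __name__ == "__main__":
    demo = [">seq1", "ATGC", "GGCC", ">seq2", "ATAT"]  # hypothetical records
    for name, fraction in gc_content(demo).items():
        print(f"{name}\t{fraction:.3f}")
```

Running it on the downloaded FASTA file (`gc_content(open("nCoV_Jan31.fa"))`) should give values comparable to out.txt.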
### Step 2 - Add the custom tool to Galaxy
- Place toolExample.pl under the ```<Galaxy install directory>/tools/``` directory,
for example ```galaxy/tools/__tj_tools/toolExample/toolExample.pl```
- Write the XML definition file for the custom tool
toolExample.xml
```xml
<tool id="fa_gc_content_1" name="Compute GC content" version="0.1.0">
<description>for each sequence in a file</description>
<command interpreter="perl">toolExample.pl $input $output</command>
<inputs>
<param format="fasta" name="input" type="data" label="Source file"/>
</inputs>
<outputs>
<data format="tabular" name="output" />
</outputs>
<help>
This tool computes GC content from a FASTA file.
</help>
</tool>
```
- How the XML is rendered
[](https://i.imgur.com/V0pV7gO.png)
- The XML defines the tool's description, its command line (command), its input parameters (inputs), and its output parameters (outputs)
<br>
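Before restarting Galaxy it can help to confirm that the tool XML is well-formed and that every `$variable` in the command is declared. The sketch below does this with the standard library only; `summarize_tool` is a hypothetical helper, not part of Galaxy, and the variable check is a simple regular expression rather than a real Cheetah parser.

```python
import re
import xml.etree.ElementTree as ET

TOOL_XML = """\
<tool id="fa_gc_content_1" name="Compute GC content" version="0.1.0">
  <description>for each sequence in a file</description>
  <command interpreter="perl">toolExample.pl $input $output</command>
  <inputs>
    <param format="fasta" name="input" type="data" label="Source file"/>
  </inputs>
  <outputs>
    <data format="tabular" name="output" />
  </outputs>
  <help>This tool computes GC content from a FASTA file.</help>
</tool>
"""


def summarize_tool(xml_text: str) -> dict:
    """Parse a tool definition and report its main fields."""
    root = ET.fromstring(xml_text)  # raises ParseError if the XML is not well-formed
    command = (root.findtext("command") or "").strip()
    inputs = [p.get("name") for p in root.findall("inputs/param")]
    outputs = [d.get("name") for d in root.findall("outputs/data")]
    used = re.findall(r"\$\{?(\w+)", command)  # names referenced as $name or ${name}
    return {
        "id": root.get("id"),
        "name": root.get("name"),
        "description": root.findtext("description"),
        "inputs": inputs,
        "outputs": outputs,
        "undefined": [v for v in used if v not in inputs + outputs],
    }


if __name__ == "__main__":
    for key, value in summarize_tool(TOOL_XML).items():
        print(f"{key}: {value}")
```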
- Register the tool's XML file in Galaxy's tool configuration
galaxy/config/tool_conf.xml
```xml
<toolbox>
...
<section name="TJTools" id="tjTools">
<tool file="__tj_tools/toolExample/toolExample.xml" />
</section>
</toolbox>
```
- `name` sets the category name shown in the Tools list
<br>
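A typo in the `file` attribute is an easy mistake at this step. The following sketch lists the registered tools and reports paths that do not exist; it assumes, as in the example above, that `file` is relative to the Galaxy `tools/` directory. `list_tools` and `missing_tool_files` are hypothetical helper names.

```python
import os
import xml.etree.ElementTree as ET

TOOL_CONF = """\
<toolbox>
  <section name="TJTools" id="tjTools">
    <tool file="__tj_tools/toolExample/toolExample.xml" />
  </section>
</toolbox>
"""


def list_tools(conf_text: str) -> list:
    """Return (section name, tool file) pairs found in a toolbox config."""
    root = ET.fromstring(conf_text)
    return [
        (section.get("name"), tool.get("file"))
        for section in root.findall("section")
        for tool in section.findall("tool")
    ]


def missing_tool_files(conf_text: str, tools_dir: str) -> list:
    """Return registered tool files that are absent under tools_dir."""
    return [
        path
        for _, path in list_tools(conf_text)
        if not os.path.isfile(os.path.join(tools_dir, path))
    ]


if __name__ == "__main__":
    print(list_tools(TOOL_CONF))
    # "galaxy/tools" is an assumed location; adjust to the actual install.
    print(missing_tool_files(TOOL_CONF, "galaxy/tools"))
```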
### Step 3 - Restart Galaxy
- After the restart, the custom tool appears in the Tools list

<br>
- ```TJTools``` comes from the name attribute of ```<section>```
- ```Compute GC content``` comes from the name attribute of ```<tool>```
- ```for each sequence in a file``` comes from the content of ```<description>```
<br>
### Step 4 - Select the custom tool and test it
1. Choose the 2019-nCoV FASTA (.fa) file as input and click Execute

2. Output

3. Inspect the contents of the output file

<br>
## GATK workflow
- Sources
- [Genome / NTUH Project](https://hackmd.io/XtsPHvS1RC25IlS6K2AcNA)
- [[github] broadinstitute / GATK](https://github.com/broadinstitute/gatk)
- [[github] ohsu-comp-bio / compbio-galaxy-wrappers](https://github.com/ohsu-comp-bio/compbio-galaxy-wrappers/tree/master/gatk4)
- Workflow
- bwa
- [Source of the command usage](https://hackmd.io/XtsPHvS1RC25IlS6K2AcNA#runBWA)
```bash
# Usage: bwa mem [options] <idxbase> <in1.fq> [in2.fq]
# -M mark shorter split hits as secondary
# -R STR read group header line such as '@RG\tID:foo\tSM:bar' [null]
./bwa mem -M -R '@RG\tID:D15780_S13_L001\tSM:D15780_S13_L001\tPL:Illumina' -t 2 /Everythings/misc/bundle/b37/human_g1k_v37_decoy.fasta /Everythings/dataset/D15780_S13_L001_R2.fastq.gz | /Everythings/misc/samtools/samtools view -@ 2 -1 -o D15780_S13_L001.bam
```
- XML wrapper
```xml
<tool id="bwa_mem" name="Execute the command: 'bwa mem'" version="0.1.0">
<description>map medium and long reads (> 100 bp) against reference genome (Galaxy Version 0.7.17.1)</description>
<command><![CDATA[/Everythings/galaxy/tools/__tj_tools/misc/bwa/bwa mem -M -R '@RG\tID:D15780_S13_L001\tSM:D15780_S13_L001\tPL:Illumina' -t 2 /Everythings/misc/bundle/b37/human_g1k_v37_decoy.fasta $input -o $output 2>&1]]></command>
<inputs>
<param name="input" format="fastq" type="data" label="Source file of 'fastq'" />
</inputs>
<outputs>
<data name="output" format="sam" />
</outputs>
<help>
the wrapper of 'bwa mem' (path='__tj_tools/misc/bwa/bwa.xml')
</help>
</tool>
```
- ```$input``` and ```$output``` are variables
- [input parameter reference (tool > inputs > param)](https://docs.galaxyproject.org/en/latest/dev/schema.html#tool-inputs-param)
Declared with ```type="data"```; the user picks a matching file in the form.
Internally, Galaxy presumably looks under
``` /Everythings/galaxy/database/files/000/dataset_??.dat```
for matching files and lists them in the form
- The output file is declared with ```<data>```
and is written to ```/Everythings/galaxy/database/files/000/dataset_??.dat```

- Error handling
- Even on a successful run, bwa writes its normal progress messages to standard error, which leads Galaxy to mark the job as failed.
- Messages written to standard error
```
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 66530 sequences (20000278 bp)...
[M::process] read 66530 sequences (20000076 bp)...
[M::mem_process_seqs] Processed 66530 reads in 81.090 CPU sec, 40.345 real sec
[M::process] read 66530 sequences (20000048 bp)...
[M::mem_process_seqs] Processed 66530 reads in 73.791 CPU sec, 36.629 real sec
[M::process] read 47540 sequences (14291513 bp)...
[M::mem_process_seqs] Processed 66530 reads in 77.920 CPU sec, 38.771 real sec
[M::mem_process_seqs] Processed 47540 reads in 55.027 CPU sec, 27.465 real sec
[main] Version: 0.7.17-r1188
[main] CMD: /Everythings/galaxy/tools/__tj_tools/misc/bwa/bwa mem -M -R @RG\tID:D15780_S13_L001\tSM:D15780_S13_L001\tPL:Illumina -t 2 -o bwa_mem_output.bam /Everythings/misc/bundle/b37/human_g1k_v37_decoy.fasta /Everythings/dataset/D15780_S13_L001_R2.fastq.gz
[main] Real time: 148.109 sec; CPU: 292.716 sec
```
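- bwa's standard error therefore needs to be redirected to standard output
```
2>&1
```
- Perhaps there is an option that suppresses these messages? (`bwa mem` has a `-v` verbosity option; not tested here.)
- Results before and after the fix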

- A more complex form design for BWA-MEM
(see the next section)
<br>
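The bwa behavior described above (normal messages on standard error, exit status 0) and the effect of `2>&1` can be reproduced without bwa. This sketch uses a small stand-in child process; the `CHILD` script and the `run` helper are made up for illustration.

```python
import subprocess
import sys

# Stand-in for bwa: writes a progress message to stderr, the result to stdout,
# and exits with status 0 (success).
CHILD = "import sys; sys.stderr.write('[M::process] read 2 sequences\\n'); print('RESULT')"


def run(merge_stderr: bool):
    """Run the child; optionally fold stderr into stdout (the 2>&1 effect)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT if merge_stderr else subprocess.PIPE,
        text=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    code, out, err = run(merge_stderr=False)
    print("separate:", code, repr(out), repr(err))  # stderr is non-empty although code is 0
    code, out, err = run(merge_stderr=True)
    print("merged:  ", code, repr(out), repr(err))  # nothing is left on stderr
```

The exit status is 0 in both runs, so a check based on the exit code (such as the `detect_errors="exit_code"` attribute used in the SortSamSpark wrapper later in these notes) would not flag the job either way.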
## BWA-MEM complex form design
- ### [Source of the command usage](https://hackmd.io/XtsPHvS1RC25IlS6K2AcNA#runBWA)
```bash
# Usage: bwa mem [options] <idxbase> <in1.fq> [in2.fq]
# -M mark shorter split hits as secondary
# -R STR read group header line such as '@RG\tID:foo\tSM:bar' [null]
/Everythings/galaxy/tools/__tj_tools/misc/bwa/bwa mem -M -R '@RG\tID:D15780_S13_L001\tSM:D15780_S13_L001\tPL:Illumina' -t 2 /Everythings/misc/bundle/b37/human_g1k_v37_decoy.fasta $input -o $output 2>&1
```
- ### Preset options vs. custom options
- Form
[](https://i.imgur.com/SoT8onR.png)
- Form / options
[](https://i.imgur.com/aErpzcn.png)
- XML behind the form
```xml
<inputs>
...
<conditional name="params">
<param
name="source_select"
type="select"
label="BWA settings to use"
help="For most mapping needs use Commonly Used settings. If you want full control use Full Parameter List">
<option value="pre_set">Commonly Used</option>
<option value="full">Full Parameter List</option>
</param>
<when value="pre_set" />
<when value="full">
...
</when>
</conditional>
</inputs>
```
<br>
- ### [Hello World] An option that redirects standard error to standard output
- #### Actual argument
```2>&1```
- #### Form
- Form / options
[](https://i.imgur.com/xcwgrCP.png)
- XML behind the form
```xml
<inputs>
...
<conditional name="params">
...
<when value="full">
<param name='hide_stderr' type="boolean" checked="false"
label="redirect stderr to stdout"
help="avoid program failure" />
</when>
</conditional>
</inputs>
```
- #### Command
- XML for the command
```xml
<command><![CDATA[/Everythings/galaxy/tools/__tj_tools/misc/bwa/bwa mem
-M
-R '@RG\tID:D15780_S13_L001\tSM:D15780_S13_L001\tPL:Illumina'
-t 2 /Everythings/misc/bundle/b37/human_g1k_v37_decoy.fasta
$input_seq
-o $output
#if $params.source_select != "pre_set"
    #if $params.hide_stderr
        2>&1
    #end if
#end if
]]></command>
```
- ```#if``` and ```#end if``` must come in matching pairs
A missing closing directive causes an error at run time
```
Some #directives are missing their corresponding #end ___ tag: if, if
```

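- Variable behavior
- $variable_name denotes a variable
- Variables are hierarchical:
for example, the ```hide_stderr``` variable under ```$params```
is written ```$params.hide_stderr```
- Difference between enabled and disabled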
Yes -> 34
No -> 33

<br>
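The nested `#if` logic and the dotted variable access can be mirrored in plain Python to see which command line each form choice produces. This is only an illustration of the branching, not how Galaxy evaluates the template; `build_command`, `reference.fasta`, `reads.fastq` and `out.sam` are placeholders, and the `-R` argument is left out for brevity.

```python
def build_command(params: dict, input_seq: str, output: str) -> str:
    """Assemble a bwa mem command line following the template's branches."""
    parts = ["bwa", "mem", "-M", "-t", "2", "reference.fasta", input_seq, "-o", output]
    # Mirrors: #if $params.source_select != "pre_set"
    if params["source_select"] != "pre_set":
        # Mirrors: #if $params.hide_stderr
        # The key only exists in the "full" branch, hence the outer check.
        if params["hide_stderr"]:
            parts.append("2>&1")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_command({"source_select": "pre_set"}, "reads.fastq", "out.sam"))
    print(build_command({"source_select": "full", "hide_stderr": True}, "reads.fastq", "out.sam"))
```

The outer test matters because `hide_stderr` is declared only inside `<when value="full">`; with the preset selected there is presumably no such variable to look up.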
- ### -M option
- #### Actual argument
```-M```
- #### Form
- Form / options
[](https://i.imgur.com/WwayPPL.png)
- XML behind the form
```xml
<inputs>
...
<conditional name="params">
...
<when value="full">
<param name="mark" type="boolean" checked="true"
label="Mark shorter split hits as secondary (-M)"
help="For Picard/GATK compatibility" />
</when>
</conditional>
</inputs>
```
- #### Command
- XML for the command
```xml
<command><![CDATA[/Everythings/galaxy/tools/__tj_tools/misc/bwa/bwa mem
#if $params.source_select != "pre_set"
    #if $params.mark
        -M
    #end if
#end if
-R '@RG\tID:D15780_S13_L001\tSM:D15780_S13_L001\tPL:Illumina'
-t 2 /Everythings/misc/bundle/b37/human_g1k_v37_decoy.fasta
$input_seq
-o $output
2>&1
]]></command>
```
- #### Other references
[bwa_mem.xml # mark](https://github.com/ohsu-comp-bio/compbio-galaxy-wrappers/blob/master/bwa/bwa_mem.xml#L164)
<br>
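Since an unclosed `#if` only surfaces when the tool runs, a rough pre-check of the command template can save a restart cycle. The sketch below is a line-based approximation, not a Cheetah parser: it only knows the block directives listed in `OPENERS` and ignores inline forms. `unbalanced_directives` is a hypothetical helper.

```python
import re

# Block directives that need a matching "#end <name>"; not an exhaustive list.
OPENERS = ("if", "for", "while", "def")


def unbalanced_directives(template: str) -> list:
    """Return directives left open (or stray closers), e.g. ['if', 'if']."""
    stack = []
    for raw in template.splitlines():
        line = raw.strip()
        if line.startswith("##"):  # Cheetah single-line comment
            continue
        closing = re.match(r"#end\s+(\w+)", line)
        opening = re.match(r"#(\w+)", line)
        if closing:
            name = closing.group(1)
            if stack and stack[-1] == name:
                stack.pop()
            else:
                stack.append("end " + name)  # stray or mismatched closer
        elif opening and opening.group(1) in OPENERS:
            stack.append(opening.group(1))
    return stack


if __name__ == "__main__":
    broken = '#if $params.source_select != "pre_set"\n#if $params.hide_stderr\n2>&1\n'
    print(unbalanced_directives(broken))
    print(unbalanced_directives(broken + "#end if\n#end if\n"))
```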
- ### -R option
- #### Actual argument
```-R '@RG\tID:D15780_S13_L001\tSM:D15780_S13_L001\tPL:Illumina'```
- #### Form 1: ID
- Form / options

- XML behind the form
```xml
...
<when value="full">
...
<conditional name="readGroup">
<param
name="read_group"
type="select">
<option value="yes" selected="true">Yes</option>
<option value="no">No</option>
</param>
<when value="no"/>
<when value="yes">
<param name="read_group_id" type="text"
label="Read group identifier (ID). Each @RG line must have a unique ID. The value of ID is used in the RG tags of alignment records. Must be unique among all read groups in header section."
help="Required if RG specified. Read group IDs may be modified when merging SAM files in order to handle collisions.">
</param>
</when>
</conditional>
</when>
```
- #### Form 2: ID / validator
- Form / options
[](https://i.imgur.com/JYCoUqP.png)
- XML behind the form: ```<validator>```
```xml
...
<param name="read_group_id" type="text" ... >
<validator type="empty_field" />
</param>
```
- #### Form 3: complete
- Form / options
[](https://i.imgur.com/LYYraJm.png)
- XML behind the form
```xml
<conditional name="read_group">
<param
name="read_group_enabled"
type="select"
label='Enabled Read Group(@RG) (-R)'>
<option value="yes" selected="true">Yes</option>
<option value="no">No</option>
</param>
<when value="no"/>
<when value="yes">
<param name="read_group_id" type="text"
value="D15780_S13_L001"
label="Read group identifier (ID). Each @RG line must have a unique ID. The value of ID is used in the RG tags of alignment records. Must be unique among all read groups in header section."
help="Required if RG specified. Read group IDs may be modified when merging SAM files in order to handle collisions.">
<validator type="empty_field" />
</param>
<param name="read_group_sm" type="text" from_dataset="input_seq"
value="D15780_S13_L001"
label="Sample (SM)."
help="Required if RG specified. Use pool name where a pool is being sequenced">
<validator type="empty_field" />
</param>
<param name="read_group_pl" type="select"
label="Platform/technology used to produce the reads (PL)" help="Optional">
<option value=""></option>
<option value="CAPILLARY">CAPILLARY</option>
<option value="LS454">LS454</option>
<option value="ILLUMINA" selected='true'>ILLUMINA</option>
<option value="SOLID">SOLID</option>
<option value="HELICOS">HELICOS</option>
<option value="IONTORRENT">IONTORRENT</option>
<option value="PACBIO">PACBIO</option>
</param>
</when>
</conditional>
```
- Result
[](https://i.imgur.com/2BYqICv.png)
- #### Full bwa_mem.xml
```xml
<tool id="bwa_mem" name="Execute the command: 'bwa mem'" version="0.1.0">
<description>map medium and long reads (> 100 bp) against reference genome (Galaxy Version 0.7.17.1)</description>
<command><![CDATA[
## /Everythings/galaxy/tools/__tj_tools/misc/bwa/bwa mem
## -M -R '@RG\tID:D15780_S13_L001\tSM:D15780_S13_L001\tPL:Illumina'
## -t 2 /Everythings/misc/bundle/b37/human_g1k_v37_decoy.fasta
## /Everythings/dataset/D15780_S13_L001_R2.fastq.gz
## -o bwa_mem_output.bam
/Everythings/galaxy/tools/__tj_tools/misc/bwa/bwa mem
#if $params.source_select == "pre_set"
    -M
#else
    #if $params.mark
        -M
    #end if
    #if $params.read_group.read_group_enabled == 'no'
        ## no param: -R STR read group header line such as '@RG\tID:foo\tSM:bar' [null]
        #pass
    #else
        ## -R '@RG\tID:D15780_S13_L001\tSM:D15780_S13_L001\tPL:Illumina'
        #set $rg_id = $params.read_group.read_group_id
        #set $rg_sm = $params.read_group.read_group_sm
        #set $rg_pl = $params.read_group.read_group_pl
        #if $rg_sm
            #set $rg_sm = '\\tSM:%s' % $rg_sm
        #end if
        #if $rg_pl
            #set $rg_pl = '\\tPL:%s' % $rg_pl
        #end if
        -R '@RG\tID:${rg_id}${rg_sm}${rg_pl}'
    #end if
#end if
-t 2 /Everythings/misc/bundle/b37/human_g1k_v37_decoy.fasta
$input_seq
-o $output
#if $params.source_select == "pre_set"
    2>&1
#else
    #if $params.hide_stderr
        2>&1
    #end if
#end if
]]></command>
<inputs>
<param name="input_seq" format="fastq" type="data" label="Source file of 'fastq'" />
<conditional name="params">
<param
name="source_select"
type="select"
label="BWA settings to use"
help="For most mapping needs use Commonly Used settings. If you want full control use Full Parameter List">
<option value="pre_set">Commonly Used</option>
<option value="full">Full Parameter List</option>
</param>
<when value="pre_set" />
<when value="full">
<param name='hide_stderr' type="boolean" checked="true"
label="redirect stderr to stdout"
help="avoid program failure" />
<param name="mark" type="boolean" checked="true"
label="Mark shorter split hits as secondary (-M)"
help="For Picard/GATK compatibility" />
<conditional name="read_group">
<param
name="read_group_enabled"
type="select"
label='Enabled Read Group(@RG) (-R)'>
<option value="yes" selected="true">Yes</option>
<option value="no">No</option>
</param>
<when value="no"/>
<when value="yes">
<param name="read_group_id" type="text"
value="D15780_S13_L001"
label="Read group identifier (ID). Each @RG line must have a unique ID. The value of ID is used in the RG tags of alignment records. Must be unique among all read groups in header section."
help="Required if RG specified. Read group IDs may be modified when merging SAM files in order to handle collisions.">
<validator type="empty_field" />
</param>
<param name="read_group_sm" type="text"
value="D15780_S13_L001"
label="Sample (SM)."
help="Required if RG specified. Use pool name where a pool is being sequenced">
<validator type="empty_field" />
</param>
<param name="read_group_pl" type="select"
label="Platform/technology used to produce the reads (PL)" help="Optional">
<option value=""></option>
<option value="CAPILLARY">CAPILLARY</option>
<option value="LS454">LS454</option>
<option value="ILLUMINA" selected='true'>ILLUMINA</option>
<option value="SOLID">SOLID</option>
<option value="HELICOS">HELICOS</option>
<option value="IONTORRENT">IONTORRENT</option>
<option value="PACBIO">PACBIO</option>
</param>
</when>
</conditional>
</when>
</conditional>
</inputs>
<outputs>
<data name="output" format="sam" />
</outputs>
<help>
the wrapper of 'bwa mem' (path='__tj_tools/misc/bwa/bwa_mem.xml')
</help>
</tool>
```
- #### Other references
[bwa_mem.xml # readGroup](https://github.com/ohsu-comp-bio/compbio-galaxy-wrappers/blob/master/bwa/bwa_mem.xml#L165)
<br>
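The `#set` lines in the full bwa_mem.xml build the `-R` argument piece by piece: ID is always present, while SM and PL are appended only when set. The same assembly can be sketched in Python to check the resulting string; `read_group_arg` is a hypothetical helper, and it assumes the values contain no single quotes.

```python
def read_group_arg(rg_id: str, sm: str = "", pl: str = "") -> str:
    """Build the bwa mem -R argument; SM and PL are appended only when set."""
    if not rg_id:
        # Same intent as the empty_field validator on the ID field.
        raise ValueError("read group ID must not be empty")
    # "\\t" is a literal backslash followed by "t"; bwa expands it to a tab itself.
    header = "@RG\\tID:" + rg_id
    if sm:
        header += "\\tSM:" + sm
    if pl:
        header += "\\tPL:" + pl
    return "-R '" + header + "'"


if __name__ == "__main__":
    print(read_group_arg("D15780_S13_L001", "D15780_S13_L001", "ILLUMINA"))
    print(read_group_arg("D15780_S13_L001"))
```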
## BWA-MEM wrapper
- ### Wrapping BWA-MEM with Python
- [BWA-MEM wrapper](https://github.com/ohsu-comp-bio/compbio-galaxy-wrappers/tree/master/bwa)
- [bwa_mem.xml](https://github.com/ohsu-comp-bio/compbio-galaxy-wrappers/blob/master/bwa/bwa_mem.xml)
- [bwa_mem.py](https://github.com/ohsu-comp-bio/compbio-galaxy-wrappers/blob/master/bwa/bwa_mem.py)
- Deploying the reference genome
- Options in the UI

- [tool_data_table_config_path (官方說明)](https://docs.galaxyproject.org/en/master/admin/config.html#tool-data-table-config-path)
> XML config file that contains data table entries for the ToolDataTableManager. This file is manually maintained by the Galaxy administrator (.sample used if default does not exist).
- More information
- [Data Preparation documentation](https://galaxyproject.org/admin/data-preparation/)
- Download the reference genome indexes built by the Galaxy team (Galaxy Datacache)
- http://datacache.galaxyproject.org/
- http://datacache.galaxyproject.org/indexes/hg19/
- http://datacache.galaxyproject.org/indexes/hg19/hg19full/bwa_index/
- Add a data table entry and define its columns
- Append the following to config/tool_data_table_conf.xml (or config/tool_data_table_conf.xml.sample)
```xml
<table name="bwa_mem_indexes" comment_char="#">
<columns>value, dbkey, name, path</columns>
<file path="/Everythings/galaxy/tools/__tj_tools/misc/bwa/tool-data/bwa_index.loc" />
</table>
```
- In bwa_index.loc, list the reference genomes in use, one per line, with tab-separated fields
- bwa_index.loc
```
human_g1k_v37 b37 human_g1k_v37_decoy /Everythings/misc/bundle/b37/human_g1k_v37_decoy.fasta
```
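- Column 1: `<unique_build_id>` ID of the reference genome
- Column 2: `<dbkey>` commonly used reference genome key, e.g. b37 (hg19), b38 (hg38)
- Column 3: `<display_name>` option name shown in the UI
- Column 4: `<file_path>` actual location of the reference genome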
<br>
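Because the fields must be separated by tabs, a .loc file that was edited with spaces fails silently in the UI. A small parser can catch that early. This sketch assumes the four columns declared in the `<columns>` element above and `#` as the comment character; `parse_loc` is a hypothetical helper, not Galaxy's own loader.

```python
COLUMNS = ["value", "dbkey", "name", "path"]


def parse_loc(text: str, columns=COLUMNS, comment_char: str = "#") -> list:
    """Parse .loc content into a list of dicts keyed by column name."""
    rows = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith(comment_char):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != len(columns):
            raise ValueError(
                f"line {number}: expected {len(columns)} tab-separated fields, got {len(fields)}"
            )
        rows.append(dict(zip(columns, fields)))
    return rows


if __name__ == "__main__":
    content = (
        "# value\tdbkey\tname\tpath\n"
        "human_g1k_v37\tb37\thuman_g1k_v37_decoy\t"
        "/Everythings/misc/bundle/b37/human_g1k_v37_decoy.fasta\n"
    )
    for row in parse_loc(content):
        print(row)
```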
## Full bwa mem configuration (official version)
- File path: galaxy/database/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/bwa/01ac0a5fedc3/bwa/
- bwa.xml
- bwa-mem.xml
- bwa_macros.xml
- read_group_macros
<br>
## SAM-to-BAM
- ### Install the sam_to_bam package
> Convert SAM format to BAM format.
- ### A reference genome is required
- The required data_table is named fasta_indexes
```xml
<param name="index" type="select" label="Using reference genome">
<options from_data_table="fasta_indexes">
<filter column="dbkey" key="dbkey" ref="input1" type="data_meta" />
<validator message="No reference genome is available for the build associated with the selected input dataset" type="no_options" />
</options>
</param>
```
- Add a data table entry and define its columns
- Append the following to config/tool_data_table_conf.xml (or config/tool_data_table_conf.xml.sample)
```xml
<table name="fasta_indexes" comment_char="#">
<columns>value, dbkey, name, path</columns>
<file path="tool-data/fasta_indexes.loc" />
</table>
```
- In fasta_indexes.loc, list the reference genomes in use, one per line, with tab-separated fields
- fasta_indexes.loc
```
human_g1k_v37 hg_g1k_v37 human_g1k_v37_decoy /Everythings/misc/bundle/b37/human_g1k_v37_decoy.fasta
```
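- Column 1: `<unique_build_id>` ID of the reference genome
- Column 2: `<dbkey>` commonly used reference genome key, e.g. b37 (hg19), b38 (hg38)
- Column 3: `<display_name>` option name shown in the UI
- Column 4: `<file_path>` actual location of the reference genome
- Notes
- The dbkey has to be changed from b37 to hg_g1k_v37
so that it matches the built-in list and passes the ```<filter>```
- Alternatively, remove the ```<filter>``` first while testing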
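The effect of the `<filter>` can be pictured as a plain comparison between the dbkey column and the dbkey of the selected dataset. The sketch below assumes an exact string match, which is a simplification of whatever Galaxy does internally; `selectable_genomes` and `LOC_ROWS` are made up for illustration.

```python
LOC_ROWS = [
    {
        "value": "human_g1k_v37",
        "dbkey": "b37",
        "name": "human_g1k_v37_decoy",
        "path": "/Everythings/misc/bundle/b37/human_g1k_v37_decoy.fasta",
    },
]


def selectable_genomes(rows: list, dataset_dbkey: str) -> list:
    """Return display names whose dbkey equals the dataset's dbkey."""
    return [row["name"] for row in rows if row["dbkey"] == dataset_dbkey]


if __name__ == "__main__":
    # With dbkey b37 in the .loc file, a dataset tagged hg_g1k_v37 finds nothing.
    print(selectable_genomes(LOC_ROWS, "hg_g1k_v37"))
    # After changing the .loc entry to hg_g1k_v37, the genome becomes selectable.
    fixed = [dict(LOC_ROWS[0], dbkey="hg_g1k_v37")]
    print(selectable_genomes(fixed, "hg_g1k_v37"))
```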
## Notes on command testing
- ### [BWA-MEM](https://hackmd.io/XtsPHvS1RC25IlS6K2AcNA#runBWA)
```bash
./bwa mem -M -R '@RG\tID:D15780_S13_L001\tSM:D15780_S13_L001\tPL:Illumina' -t 2 /Everythings/misc/bundle/b37/human_g1k_v37_decoy.fasta /Everythings/dataset/D15780_S13_L001_R2.fastq.gz -o D15780_S13_L001.sam
```
- ### [SAM-to-BAM](https://hackmd.io/XtsPHvS1RC25IlS6K2AcNA#runBWA)
- Install the ready-made package directly
- sam_to_bam.xml
```database/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/sam_to_bam/cf1ffd88f895/sam_to_bam/sam_to_bam.xml```
- ### [SortSamSpark](https://hackmd.io/XtsPHvS1RC25IlS6K2AcNA#runSortSam)
```bash
./gatk SortSamSpark --input D15780_S13_L001.bam --output D15780_S13_L001.sorted.bam --sort-order coordinate --java-options "-XX:+UseNUMA -Xmx16G" --tmp-dir . -- --spark-runner LOCAL --spark-master local[4] --conf spark.local.dir=./tmp
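# Note 1: the shorter form below also runs; it is unclear what the trailing arguments are for
./gatk SortSamSpark --input D15780_S13_L001.bam --output D15780_S13_L001.sorted.bam --sort-order coordinate --java-options "-XX:+UseNUMA -Xmx16G" --tmp-dir .
# Note 2:
# - Whether spark-runner is placed before or after `--`, the log shows the argument is detected either way
# - In testing, `--` had no visible effect; it seems to act as an empty argument that only separates the two groups for human readers
#   (the GATK README places Spark-specific arguments after `--`)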
```
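- Troubleshooting
- The input file the tool receives must be recognizable as a SAM/BAM file
- However, Galaxy names every input/output dataset file with the .dat extension
- As a result, the tool throws an exception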
```
A USER ERROR has occurred: Failed to read bam header from /Everythings/galaxy/database/files/000/dataset_142.dat
Caused by:Cannot find format extension for /Everythings/galaxy/database/files/000/dataset_142.dat
```
- Temporary workaround
```xml
<command detect_errors="exit_code"><![CDATA[
cp ${input} ${input}.bam; ## rename it to *.bam
@CMD_BEGIN@ SortSamSpark
##include source=$bam_req_opts#
-I ${input}.bam -O ${output}
--sort-order "${sort_order}"
## #include source=$bam_opt_opts#
--tmp-dir .
-- --spark-runner LOCAL --spark-master local[4] --conf spark.local.dir=./tmp
; rm -f '${input}.bam'
]]></command>
```
- ```cp ${input} ${input}.bam; ## rename it to *.bam```
<br>
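The workaround copies the dataset next to the original inside Galaxy's file store and removes the copy afterwards. A possible variation is to expose the dataset under a `.bam` name in a working directory instead, using a symbolic link so that no data is duplicated. The sketch below shows the idea in Python; `with_extension` is a hypothetical helper and has not been tried inside a Galaxy job.

```python
import os
import shutil


def with_extension(dataset_path: str, extension: str, workdir: str) -> str:
    """Expose dataset_path under a name ending in extension inside workdir."""
    target = os.path.join(workdir, os.path.basename(dataset_path) + extension)
    if os.path.lexists(target):
        os.remove(target)  # replace a leftover from an earlier run
    try:
        os.symlink(os.path.abspath(dataset_path), target)  # no data is copied
    except (OSError, NotImplementedError):
        shutil.copyfile(dataset_path, target)  # fallback: make a real copy
    return target


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "dataset_142.dat")
        with open(source, "wb") as handle:
            handle.write(b"placeholder")  # stands in for real BAM content
        print(with_extension(source, ".bam", tmp))
```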
<hr>
<br>
## Cheetah
- ### Overview
- A free, open-source template engine
- Also a code generation tool
- Runs on Python 2 and 3
- PyPI page: https://pypi.org/project/Cheetah3/
- Cheetah User's Guide: https://cheetahtemplate.org/users_guide/index.html
- ### Installing and running
- #### Package installation
- Install command
```pip install Cheetah3```
- python2
- [```sudo apt install python-pip```](https://blog.csdn.net/Mr_Cat123/article/details/79221012)
- Handling a failed install under python3
- Error message
>ERROR: Could not install packages due to an EnvironmentError: [Errno 13] Permission denied: '/usr/lib/python3.5/site-packages'
Consider using the `--user` option or check the permissions.
- How to fix it (the message itself also suggests the `--user` option as an alternative)
```bash
sudo python3 -m pip install Cheetah3
```
- [[Errno 13] Permission denied How i solve this problem #236](https://github.com/googlesamples/assistant-sdk-python/issues/236)
- #### Example 1 ([Quickstart tutorial](https://cheetahtemplate.org/users_guide/gettingStarted.html#quickstart-tutorial))
```python
from Cheetah.Template import Template
templateDef = """
<HTML>
<HEAD><TITLE>$title</TITLE></HEAD>
<BODY>
$contents
## this is a single-line Cheetah comment and won't appear in the output
#* This is a multi-line comment and won't appear in the output
blah, blah, blah
*#
</BODY>
</HTML>"""
nameSpace = {'title': 'Hello World Example', 'contents': 'Hello World!'}
t = Template(templateDef, searchList=[nameSpace])
print(t)
```
Output:
```
<HTML>
<HEAD><TITLE>Hello World Example</TITLE></HEAD>
<BODY>
Hello World!
</BODY>
</HTML>
```
- #### Example 2
```python
from Cheetah.Template import Template
templateDef = """
#set $people = [
{'name' : 'Tom', 'mood' : 'Happy'},
{'name' : 'Dick', 'mood' : 'Sad'},
{'name' : 'Harry', 'mood' : 'Hairy'}]
<strong>How are you feeling?</strong>
<ul>
#for $person in $people
<li>
$person['name'] is $person['mood']
</li>
#end for
</ul>
"""
print(Template(templateDef))
```
Output:
```
<strong>How are you feeling?</strong>
<ul>
<li>
Tom is Happy
</li>
<li>
Dick is Sad
</li>
<li>
Harry is Hairy
</li>
</ul>
```
- #### Example 3 ([https://cheetahtemplate.org/](https://cheetahtemplate.org/))
```cheetah
#from Cheetah.Template import Template
#extends Template
#set $people = [{'name' : 'Tom', 'mood' : 'Happy'}, {'name' : 'Dick',
'mood' : 'Sad'}, {'name' : 'Harry', 'mood' : 'Hairy'}]
<strong>How are you feeling?</strong>
<ul>
#for $person in $people
<li>
$person['name'] is $person['mood']
</li>
#end for
</ul>
```
Fill the template
```bash
$ cheetah fill test.py
Filling test.py -> test.py.html
```
Open test.py.html

<br>
## References
- [Installing Tools into Galaxy](https://galaxyproject.org/admin/tools/add-tool-from-toolshed-tutorial/)
- [Adding custom tools to Galaxy](https://galaxyproject.org/admin/tools/add-tool-tutorial/)
- [Galaxy Tool XML File](https://docs.galaxyproject.org/en/latest/dev/schema.html)
<br>
## [On-Going] Tab list
- ### ESC4000
- http://10.78.26.241:9696/
- ### Github
- [compbio-galaxy-wrappers/gatk4/gatk4_markduplicates.xml](https://github.com/ohsu-comp-bio/compbio-galaxy-wrappers/blob/master/gatk4/gatk4_markduplicates.xml)
- ### Galaxy
- [Galaxy Tool XML File](https://docs.galaxyproject.org/en/latest/dev/schema.html#tool-outputs-collection)
- [Creating a histogram tool tutorial.](https://galaxyproject.org/admin/tools/adding-tools/)