---
title: "Portability (making fewer assumptions)"
teaching: 0
exercises: 0
questions:
- "Key question (FIXME)"
objectives:
- "transform hardcoded reference files into inputs"
- "recognize (and remove) housekeeping steps in the shell script that are not needed in CWL"
- "recognize and remove details and configuration in a script that are specific to particular infrastructure"
keypoints:
- "First key point. Brief Answer to questions. (FIXME)"
---
By the end of this episode,
learners should be able to
__improve the portability of their shell (and other language) scripts__
Prerequisites:
- Adding your own script to a step
- Capturing output
# Parking lot
- explain the difference between _control flow_ and _data flow_
- What is the definition of portability, i.e. what do we mean by it?
- Should we consider how to implement cleaning up if our bash script fails halfway?
- Do we need to handle non-failure non-zero exit codes here?
### Exercise:
A script in your workflow prepares an index file for later use:
#### Inputs
```
/cwl_project
├── bin
│   └── index_genome.sh
└── data
    └── custom_genome.fa
```
#### Executed as
```shell=
bin/index_genome.sh
```
where `index_genome.sh` contains:
```bash=
#!/usr/bin/env bash
# We have a custom genome reference (in FASTA format)
# that needs to be indexed to use with the aligner "bwa"
bwa index data/custom_genome.fa
```
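The location of `custom_genome.fa` is hardcoded and specific to our setup: the relative path `data/custom_genome.fa` only resolves when the script is started from `/cwl_project`. This will cause problems later on, for example when the script is run from a different working directory.
To avoid this, we extract the path and pass it to `index_genome.sh` as an argument.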
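To see why, note that a relative path is resolved against whatever directory the script is started from, not against the script's own location. A CWL runner typically executes each step in its own fresh working directory, so `data/custom_genome.fa` would not be found there. The sketch below (assuming bash and `mktemp`; the temporary directory is a placeholder for `/cwl_project` and the FASTA file is an empty stand-in) reproduces the effect without needing `bwa`:

```shell
#!/usr/bin/env bash
# Sketch: a relative path such as data/custom_genome.fa is resolved against
# the current working directory, not against the location of the script.
set -euo pipefail

project=$(mktemp -d)                      # placeholder for /cwl_project
mkdir -p "$project/data"
touch "$project/data/custom_genome.fa"    # empty stand-in, not a real FASTA

# Report whether the hardcoded relative path resolves from the current directory
check() {
  if [ -f data/custom_genome.fa ]; then echo "found"; else echo "missing"; fi
}

(cd "$project" && check)        # from the project root: prints "found"
(cd "$(mktemp -d)" && check)    # from another directory: prints "missing"
```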
### Solution:
```bash=
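#!/usr/bin/env bash
# We have a custom genome reference (in FASTA format)
# that needs to be indexed to use with the aligner "bwa".
# The path to the reference is passed as the first argument
# (quoted, so paths containing spaces also work).
bwa index "$1"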
```
```shell=
bin/index_genome.sh data/custom_genome.fa
```
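A slightly more defensive variant, not part of the original exercise, checks that exactly one argument was given and that the file exists before calling `bwa`, so a wrong path fails with a clear message instead of a `bwa` error. The function name `index_genome` and the exit codes are illustrative choices, not a convention required by CWL:

```shell
#!/usr/bin/env bash
# Sketch of a more defensive index_genome.sh (hypothetical variant).
set -euo pipefail

index_genome() {
  # expect exactly one argument: the reference FASTA
  if [ "$#" -ne 1 ]; then
    echo "usage: index_genome.sh <reference.fa>" >&2
    return 2
  fi
  # fail early if the file is not where the caller says it is
  if [ ! -f "$1" ]; then
    echo "error: reference file '$1' not found" >&2
    return 1
  fi
  bwa index "$1"
}

# In the real script, the last line would call the function with the
# script's own arguments:
#   index_genome "$@"
```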
### Exercise:
Which of these commands will work?

Options (TODO: a bigger example?):
1. `bwa index /data/reference`
2. `bwa index data/reference`
3. `bwa index $reference_genome`
4. `bwa index $1`

### Solution:
1. No, the path is hardcoded.
2. Could work with `InitialWorkDirRequirement`, but should be avoided.
3. Will work if the variable `reference_genome` has been set.
4. Will work if the path is passed to the script as its first argument.
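Options 3 and 4 differ in where the value comes from: `$reference_genome` has to be placed in the environment by whoever starts the script (in CWL, for example, via an `EnvVarRequirement`), while `$1` is simply the first command-line argument. A small bash sketch with hypothetical function names shows both:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Option 3: the path comes from an environment variable set by the caller
from_env() {
  echo "${reference_genome:-<unset>}"
}

# Option 4: the path comes from the first positional argument
from_arg() {
  echo "${1:-<unset>}"
}

from_env                                          # "<unset>" unless the caller exported it
reference_genome=data/custom_genome.fa from_env   # prints "data/custom_genome.fa"
from_arg data/custom_genome.fa                    # prints "data/custom_genome.fa"
```

The positional argument keeps the dependency visible on the command line, which is why it maps most directly onto a CWL input.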
### Exercise:
Which command will clean up the environment?

Options:
1. `rm -f data/reference.fa.*`
2. `rm -f $1`
3. No need for that
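To reason about the options, it helps to know what each one would delete. `bwa index` writes its index next to the FASTA file as five extra files (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`). The sketch below fakes those files with `touch` (no real `bwa` is needed) and lists, with `ls` instead of `rm`, what options 1 and 2 would match when the script receives `data/reference.fa` as `$1`:

```shell
#!/usr/bin/env bash
set -euo pipefail

cd "$(mktemp -d)"
mkdir -p data
touch data/reference.fa                  # the input reference (empty placeholder)
for ext in amb ann bwt pac sa; do
  touch "data/reference.fa.$ext"         # files that `bwa index` would create
done

set -- data/reference.fa                 # pretend the script got this as $1

echo "option 1 matches:"; ls data/reference.fa.*
echo "option 2 matches:"; ls "$1"
```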
---
title: "Turning a shell script into a workflow"
teaching: 0
exercises: 0
questions:
- "Key question (FIXME)"
objectives:
- "identify tasks and data links in a script"
- "recognize loops that can be converted into scatters"
- "find and reuse existing CWL command line tool descriptions"
keypoints:
- "First key point. Brief Answer to questions. (FIXME)"
---
By the end of this episode,
learners should be able to
__convert a shell script into a CWL workflow__
Prerequisites:
- Portability
- Adding your own script to a step
- Capturing output
- Scattering
- Conditional (may be moved to intermediate curriculum)
```bash=
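#!/usr/bin/env bash
set -euo pipefail
# build the bwa index only if it is not there yet
# (bwa index writes data/reference.fa.sa, among other files)
if ! [ -f data/reference.fa.sa ]; then
  bwa index data/reference.fa
fi
mkdir -p outputs
# align each sample (assumed here to be FASTQ files in samples/)
for sample in samples/*.fastq; do
  name=$(basename "$sample" .fastq)
  bwa mem data/reference.fa "$sample" > "outputs/$name.sam"
done
# we have limited storage space so we
# clean up the indices once we are done
rm -f data/reference.fa.*
# TODO: more commands, to demonstrate data linkages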
```
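One way to start identifying tasks and data links is to rewrite the script so that each command becomes a function that only touches the paths it is given. In the sketch below (function names `index_step`, `align_step` and `run_all` are hypothetical, and samples are assumed to be `.fastq` files), each function is a candidate CWL step, the arguments passed between them are the data links, and the `for` loop in `run_all` is the part a CWL scatter would replace, since its iterations are independent. The dependency of `align_step` on the index files is implicit here (they sit next to the FASTA); a CWL workflow has to make that link explicit. The cleanup `rm` is left out on purpose, since such housekeeping is usually left to the CWL runner:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Step 1: build the index. Input: a FASTA file.
# Output: index files written next to it (.amb, .ann, .bwt, .pac, .sa)
index_step() {
  local reference=$1
  bwa index "$reference"
}

# Step 2: align ONE sample. Inputs: the indexed reference and one reads file.
# Output: a single SAM file.
align_step() {
  local reference=$1 reads=$2 out=$3
  bwa mem "$reference" "$reads" > "$out"
}

# Driver: runs the same step once per sample with no dependency between
# iterations, which is what a CWL scatter expresses.
run_all() {
  local reference=$1
  shift
  index_step "$reference"
  local reads
  for reads in "$@"; do
    align_step "$reference" "$reads" "$(basename "$reads" .fastq).sam"
  done
}
```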
### Exercise:
Find existing CWL tool descriptions for all the tools in the script above.
A good place to look is https://github.com/common-workflow-library/bio-cwl-tools

### Solution:
* https://github.com/common-workflow-library/bio-cwl-tools/blob/release/bwa/BWA-Index.cwl
* https://github.com/common-workflow-library/bio-cwl-tools/blob/release/bwa/BWA-Mem.cwl