Portability (making fewer assumptions)

---
title: "Portability (making fewer assumptions)"
teaching: 0
exercises: 0
questions:
- "Key question (FIXME)"
objectives:
- "transform hardcoded reference files into inputs"
- "recognize (and remove) housekeeping steps in the shell script that are not needed in CWL"
- "recognize and remove details and configuration in a script that are specific to particular infrastructure"
keypoints:
- "First key point. Brief Answer to questions. (FIXME)"
---

By the end of this episode, learners should be able to __improve the portability of their shell (and other language) scripts__.

Prerequisites:

- Adding your own script to a step
- Capturing output

# Parking lot

- explain the difference between _control flow_ and _data flow_
- What is the definition of Portability? or what do we mean by it?
- Should we consider how to implement cleaning up if our bash script fails half way?
- Do we need to handle non-failure non-zero exit codes here?

### Exercise:

A script in your workflow prepares an index file for later use:

#### Inputs

```
/cwl_project
├── bin
│   └── index_genome.sh
└── data
    └── custom_genome.fa
```

#### Executed as

```shell
bin/index_genome.sh
```

where `index_genome.sh` contains:

```bash
#!/usr/bin/env bash

# We have a custom genome reference (in FASTA format)
# that needs to be indexed to use with the aligner "bwa"

bwa index data/custom_genome.fa
```

The location of `custom_genome.fa` is specific to our setup, which can lead to problems later on. To avoid this, we want to extract the path and provide it as an argument to our `index_genome.sh` script.

### Solution:

```bash
#!/usr/bin/env bash

# We have a custom genome reference (in FASTA format)
# that needs to be indexed to use with the aligner "bwa"

# The path to the reference is now the first argument
bwa index "$1"
```

```shell
bin/index_genome.sh data/custom_genome.fa
```

Exercise: Which of these commands will work?

Options

TODO: a bigger example?

1. `bwa index /data/reference`
2. `bwa index data/reference`
3. `bwa index $reference_genome`
4. `bwa index $1`

Answer

1. No, hardcoded path
2. Could work with InitialWorkDirRequirement, but should be avoided
3. Will work if we have established the variable
4. Will work if the path is passed as the first argument

Exercise: Which command will clean up the environment?

Options

1. `rm -f data/reference.fa.*`
2. `rm -f $1`
3. No need for that

---
title: "Turning a shell script into a workflow"
teaching: 0
exercises: 0
questions:
- "Key question (FIXME)"
objectives:
- "identify tasks, and data links in a script"
- "recognize loops that can be converted into scatters"
- "find and reuse existing CWL command line tool descriptions"
keypoints:
- "First key point. Brief Answer to questions. (FIXME)"
---

By the end of this episode, learners should be able to __convert a shell script into a CWL workflow__.

Prerequisites:

- Portability
- Adding your own script to a step
- Capturing output
- Scattering
- Conditional (may be moved to intermediate curriculum)

```bash
#!/usr/bin/env bash

# build the index only if it does not exist yet
if ! [ -f data/reference.fa.sa ]; then
    bwa index data/reference.fa
fi

# assumes the read files live in a samples/ directory
for sample in samples/*; do
    bwa mem data/reference.fa "$sample" > "outputs/$(basename "$sample").sam"
done

# we have limited storage space so we
# cleanup indices once we are done
rm -f data/reference.fa.*

# TODO: more commands, to demonstrate data linkages
```

Exercise: Find existing CWL tool descriptions for all the tools in the above script

https://github.com/common-workflow-library/bio-cwl-tools

Answers

* https://github.com/common-workflow-library/bio-cwl-tools/blob/release/bwa/BWA-Index.cwl
* https://github.com/common-workflow-library/bio-cwl-tools/blob/release/bwa/BWA-Mem.cwl
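
Before splitting the script above into CWL steps, it can help to apply the portability ideas from the previous episode to it. The following is a minimal sketch under stated assumptions, not part of the lesson itself: the function name `align_samples`, the `BWA` variable, and the argument order are hypothetical choices. The reference, the output directory, and the samples all become arguments, and the `rm -f` cleanup is dropped because the episode objectives treat such housekeeping as unnecessary in CWL.

```bash
#!/usr/bin/env bash
# Portable sketch of the alignment script: no hardcoded paths.
# Hypothetical names (not from the lesson): align_samples, BWA.

# Aligner command; overridable so the script does not assume
# where (or under which name) bwa is installed.
BWA="${BWA:-bwa}"

# Usage: align_samples REFERENCE OUTDIR SAMPLE...
align_samples() {
    if [ "$#" -lt 3 ]; then
        echo "usage: align_samples REFERENCE OUTDIR SAMPLE..." >&2
        return 2
    fi

    local reference="$1" outdir="$2"
    shift 2

    mkdir -p "$outdir"

    # Task 1: build the index only if it is missing
    # (bwa index writes REFERENCE.sa among other files).
    if ! [ -f "$reference.sa" ]; then
        "$BWA" index "$reference"
    fi

    # Task 2: one alignment per sample; each iteration is
    # independent, which makes this loop a scatter candidate.
    local sample name
    for sample in "$@"; do
        name="$(basename "$sample")"
        "$BWA" mem "$reference" "$sample" > "$outdir/$name.sam"
    done
}
```

For example, `align_samples data/reference.fa outputs sample1.fq sample2.fq` (file names hypothetical) would index the reference if needed and write one `.sam` file per sample into `outputs`. The two numbered tasks correspond to the two tool descriptions listed in the answers.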