bschiffthaler
# Day 1 Doc

## Schedule

| Time (CET) | Program |
|------------|---------|
| 14:00 - 14:55 | Setup & Introductions |
| 15:00 - 15:40 | Getting Started with Nextflow |
| 15:45 - 16:20 | Nextflow Scripting |
| 16:25 - 16:50 | Nextflow parametrization |
| 16:55 - 17:35 | Channels |
| | |
| 17:35 - 18:30 | **Break** |
| | |
| 18:30 - 19:10 | Processes |
| 19:15 - 20:00 | Processes II |
| Extra? | Workflows |

## Session 0 - Setup

This is a living document that we will use throughout the course, with the aim of documenting as much as possible so that it becomes a useful dry-lab workbook.

### Poll 1

1. I have installed the nextflow extension: ***** ***** *****
3. I will use the web version: ***

### Poll 2

1. I am connected: Anna Tom Tracy Matthew Mikhail | * | Ioana Junmo Francesco Juan |Panos |Brittany Hira
2. I am having connection issues: Alex
3. Other (please specify):
4. Could not connect via desktop app (Panos)

### Connection

Use the command palette (menu `View > Command Palette`) and start typing `>Remote-SSH: Connect to Host...`. Then select franklin as the host (or franklin.upsc.se). Alternatively, use the green button in the bottom left corner.

For the initial setup, choose `Add New SSH Host` and type in `ssh student@franklin.upsc.se:PORT`, where your port is listed in the table below. You will be prompted for the operating system; choose `linux`. A new window will open, prompting for the password. Use the password we shared (and keep it secret!).

| Name | Port |
|------|------|
| Mikhail Osipovitch | 9000 |
| Kun Li | 9001 |
| Alexander Vergara | 9002 |
| Tom Jenkins | 9003 |
| Hira Naveed | 9004 |
| Nick Pestell | 9005 |
| Brittany Roberts | 9006 |
| Matthew Adair | 9007 |
| Panagiotis Provataris | 9008 |
| Ioana Onut Brännström | 9009 |
| Junmo Sung | 9010 |
| Juan Gaitan | 9011 |
| Francesco Gualdrini | 9012 |
| Anna Sommer | 9013 |
| Tracy Chew | 9014 |
| Mohsen Hegab | 9015 |

```
student@franklin.upsc.se:PORT
example: student@franklin.upsc.se:9019
```

### :question: Questions!
* Is there any extension within VS Code for a VPN connection? Or, if we need a VPN agent, do we run it outside VS Code (some institutions require one when working off-site)?
    * The VPN is independent of VS Code. If you have e.g. Cisco connected, traffic should get routed through it and you should be fine. Specifics should be discussed with your corporate IT.
* Is there an extension within VS Code for X11 forwarding? And keen to hear about any other cool extensions that you recommend :smile:
    * In general, X11 and VS Code do not play well with each other, but there are data science plugins (e.g. for Jupyter, R, etc.) that render plots inside VS Code, so there isn't really a need for it.
* I see nextflow is already pre-installed in the franklin space. Could you show best practice for installing it on HPC, AWS or others (containers, conda envs, etc.)? Which way of installing nextflow is the best?
    * `conda` works well. Most of the time, your local IT support (on HPCs - High Performance Compute) will have it pre-installed.
    * `curl -s https://get.nextflow.io | bash`
        * Needs:
            * Java 11+
            * curl
            * bash
* What is the best practice for the order of code? I.e. I've seen colleagues do params > process(es) > workflow. You have presented params > workflow > process(es).
    * I don't think that there are real guidelines for that (Nico), but I might have missed a Good Coding Practice for NF. When workflows get really big, you can actually split them into multiple files. My opinion is that as long as you have a clear structure and it's shared with your colleagues, it should be fine.
* I see under input: path read - how did it know to check input_ch?
    * In the workflow, you call the process `NUM_LINES(input_ch)`. In the process you assign the input to the variable `read` (the name is arbitrary, you choose), which is of type `path` (_i.e._ it expects a filename).
* I see! What if you have multiple inputs per process?
    * Then you will provide multiple inputs :smile: I'm sure we will hear about that :wink:
* If you now want to do something else with the reads, how would you continue the workflow? Can you highlight the blocks that execute a feature one more time? Thank you.
    * We will talk more about this, but basically you would either extend the `script` in the current `process`, or create a new `process` that takes the output of `NUM_LINES` as input. How the processes are chained is what you define in the `workflow` section. Whether you extend the `script` or create a new `process` depends on several factors, but usually you'd like processes to do `atomic` operations, _i.e._ a very clearly defined single (if not simple) operation (like compressing a file). The advantage is that such a process can easily be re-used. It is however a balancing act between being atomic and being time efficient when writing the pipeline. (my opinion, Nico)
* The `#!…` will always be the same even when coding and using a conda env?
    * Yes, unless you are on a very exotic operating system, but I have not seen any in a very long time, so let's go for 99% :wink:
* How can one add comments next to a line, like the use of `#` in bash?
    * Like in Java: use `//` - see what Bastian is doing in the code. You can have multi-line comments using:
      ```{nextflow}
      /*
      some longer comments
      */
      ```
* How do you connect the parameter file with the nextflow script? I missed it in the explanation.
    * See the example in section 3 [below](https://hackmd.io/HkpBvb1eQ82vGPLVKwMDqg?both#Use-a-param-file) :smile:
* I know there are usually several config files in addition to the params. Could you suggest best practice to organize NF code and keep it tidy?
    * As you just heard, that's also more or less a personal preference. Bastian recommends having nextflow settings in `nextflow.config` and params in a separate file.
      If you have a large pipeline and a lot of config, you can use the `includeConfig` statement in `nextflow.config` to split your config into multiple files.
* As mentioned, can the separate params file be written in Groovy and sourced from within the nf workflow? Can, should? :smile:
    * You can, but you probably shouldn't (if you still want to, you can look at the `import` keyword in Groovy).
* How would you recommend architecting your scripts if, for example, you had 100 samples to align, 2 fastq's each? I.e. run the nextflow command 100 times, or write code to create params.json and run nextflow once?
    * Much easier than this! There is a channel factory that identifies paired-end (PE) data. Then for the 100 samples, you just use globbing as we just did (_i.e._ `nextflow run <script>.nf --input "*.fq.gz"`) and you let nextflow do the magic. `input` being your `input_ch`, nextflow is going to iterate per PE set. If, in addition, you configure `nextflow` to submit to your queueing system (_e.g._ SLURM, PBS, whatever), it will handle all the job submission. Neat!
* So it will automatically pattern match .fq.gz files to identify pairs? That is pretty cool :)
* I didn't really get what channels are used for?
    * Channels are how nextflow communicates between processes: https://www.nextflow.io/docs/latest/channel.html. channel (data input) -> process -> channel (process output) -> possible channel modification (next data input) -> next process -> next channel -> next process -> _etc_...
* So in a real example, would you say use sample IDs in a queue channel to process each sample 1 by 1, then use a value channel to keep calling the reference genome channel?
    * Yup. If you look at https://www.nextflow.io/docs/latest/channel.html, you will see there are queue channels to get sequencing files as paired-end or from the SRA. Reusing a queue channel is useful if you want a fork in your workflow, e.g. running fastqc on all raw data (one process) while running sortmerna on the raw data (second process).
* Difference between value and queue channels?
    * Nextflow distinguishes two different kinds of channels: queue channels and value channels. A queue channel is a non-blocking unidirectional FIFO queue which connects two processes or operators. A value channel, a.k.a. singleton channel, is by definition bound to a single value and can be read an unlimited number of times without consuming its content. So a queue is something you take values from, and it eventually gets exhausted once all the values it stores have been emitted. A value channel is infinitely re-usable.
* I saw in nf-core that not all params are converted to actual value channels. Is this normal practice? In several codes they simply assign params values to other variables, i.e. ch_fast = params.faster
    * Yes, for example if you have a param which is your genome version, you do not need it to be in a channel, a param is enough. It's again a more or less personal choice. You could also create a `Channel.value(params.genome)` :smirk:
* Is it advisable to tie each process to its own environment (conda or docker) to keep track of tool versions?
    * Yes. That ensures reproducibility. As an example of a process:
      ```{nextflow}
      process FASTQC_SE {
        // Run fastqc per file
        tag "fastqc ${reads}"
        publishDir "results/$params.dataset/fastqc/", mode: "copy"
        container 'docker://bschiffthaler/fastqc'

        input:
        path(reads)

        output:
        path("${reads.simpleName}_fastqc.html")

        script:
        """
        fastqc ${reads}
        """
      }
      ```
      I have singularity enabled (more about that tomorrow), so singularity is going to pull the `docker://bschiffthaler/fastqc` image. If you use a tag, _e.g._ `docker://bschiffthaler/fastqc:0.11.9`, you would force a specific version. Of course, the container could be a config/param and you could have all versions in a separate file.
* So with conda, automating the download is not doable like with docker?
    * It also is, you just need to tell conda what version to fetch.
* How to turn a queue into a list-of-values channel? Like when all the samples have been aligned, then I want to collect all of them into a unique channel.
    * It's called `collect` and we will see it later :slightly_smiling_face: - see [collect](https://www.nextflow.io/docs/latest/operator.html?highlight=collect#collect) in the doc.
* Can the naming in the work directory be controlled? Like to output meaningful names?
    * Sure. You can control the output name: check the code bit above where I output a filename depending on the input. Input is `reads`, output is `"${reads.simpleName}_fastqc.html"`. It is actually quite essential, as tools (such as fastqc) often have "hardcoded" output file names, so you will want to capture them accordingly. There is more to that, in the sense that you can choose which outputs will be kept and which will be considered temporary. Usually you do not want all the files created during the pre-processing to be saved. In my example above, as you can see, I actually copy the fastqc results into my "publish" directory, where the final results will be found. Instead of copying, I could move them - saving space (tbh, the block above is some work in progress, so when prototyping I like to copy and be fairly verbose. At a later stage, I will clean it up).
* What is the advantage of using value channels as opposed to directly using params.kmer, for example, as params.kmer will also be reusable?
    * It is more flexible. You can have `params.samples=*.fq.gz` and use that, or you can use a channel like:
      ```{nextflow}
      ch_input = Channel.fromFilePairs("$params.dataset" + '/*_{1,2}.f*q.gz', checkIfExists:true)
      ```
      that will allow you to handle paired-end data without any extra effort of handling that by yourself.
      Even if that channel is consumable, you can use it in multiple processes:
      ```{nextflow}
      workflow {
        FASTQC(ch_input)
        SORTMERNA(ch_input)
      }
      ```
      So the same channel is used for fastqc while forked to be run through SortMeRNA (rRNA identification / removal).

### :exclamation: Comments:

* Reentrant: the same as "resume". The pipeline can restart from the latest completed stage, skipping all the previous steps. It is not the default in Nextflow, you need to use the `-resume` flag (single dash) on the command line.
* DSL - A domain-specific language (DSL) is a [computer language](https://en.wikipedia.org/wiki/Computer_language) specialized to a particular application [domain](https://en.wikipedia.org/wiki/Domain_(software_engineering)). The second iteration of the Nextflow DSL (a.k.a. DSL2) greatly simplified the use of Nextflow, see this blog [post](http://www.ens-lyon.fr/LBMC/intranet/services-communs/pole-bioinformatique/bioinfoclub_list/nextflow-dsl2-laurent-modolo).

## Links

[Nextflow documentation](https://www.nextflow.io/docs/latest/)

[Course from SciLifeLab, the Swedish National Sequencing / Bioinformatics facility](https://uppsala.instructure.com/courses/58267/pages/nextflow-1-introduction?module_item_id=387489)

[Some extra content from the same course](https://uppsala.instructure.com/courses/58267/pages/nextflow-7-extra-material?module_item_id=468797)

[Another good "book" reference](https://bioinformaticsworkbook.org/dataAnalysis/nextflow/02_creatingAworkflow.html#gsc.tab=0)

[Some good tips from the Andersen lab (Denmark)](https://andersenlab.org/dry-guide/2021-12-01/writing-nextflow/)

[Existing workflows - nf-core](https://nf-co.re/) - you can pull them and run them. The website also lets you configure them graphically and export the corresponding config file.

## Session 1 - Getting Started with Nextflow

```{bash}
cd nf-training/
cp scripts/introduction/wc.nf .
nextflow run wc.nf
```

## Session 2 - Nextflow scripting

```{nextflow}
#!/usr/bin/env nextflow

// printing some lines
println("Hello, world!")
println "Hello city"

// Types
my_var = 1
my_f = 3.14159265
my_bool = false
my_s = "chr1"
my_pattern = /\d+/
text = """
this is a
multi line
string
"""

println "Current ${my_var}_chromosome"

// Compounds
kmers = [1,2,4]
kmers[0]
kmers[-1]
println kmers[0..1]
println "My kmers: ${kmers[0..1]}"
println "The list of kmers is ${kmers.size()} elements long"

/*
some longer comments
*/

// Maps
roi = [chr: "chr10", start: 10000, end: 12000, genes: ["ATP1B2", "TP53"]]
println roi["chr"]
println roi.chr
println roi.get("chr")

// Closures
square = { it * it } // "it" is the default variable name
// you can redefine the variable used in closures as follows:
// square = { variable -> variable * variable}
x = [1,2,3,4]
y = x.collect(square)
println x
println y
```

## Session 3 - Workflow parameterization

```{bash}
nextflow run wc.nf --input "data/yeast/reads/ref2*.fq.gz"
```

### Exercise:

Re-run the Nextflow script wc.nf by changing the pipeline input to all files in the directory data/yeast/reads/ that begin with ref and end with .fq.gz

Put a star next to this comment: ***** ***** ****

```{bash}
student@f638cd152d21:~/nf-training$ nextflow run wc.nf --input "data/yeast/reads/ref*.fq.gz"
N E X T F L O W  ~  version 22.04.5
Launching `wc.nf` [marvelous_ritchie] DSL2 - revision: c6e739fcd6
executor >  local (6)
[3d/724e39] process > NUM_LINES (6) [100%] 6 of 6 ✔
ref2_2.fq.gz 81720
ref3_2.fq.gz 52592
ref2_1.fq.gz 81720
ref1_1.fq.gz 58708
ref1_2.fq.gz 58708
ref3_1.fq.gz 52592
```

### Exercise 2:

1. Command-line: 11111111111
2. External file: 2222222222
3. In the pipeline: 33333333

____

Answer (configuration sources, highest priority first):

1. Parameters specified on the command line (--something value)
2. Parameters provided using the -params-file option
3. Config file specified using the -c my_config option
4. The config file named nextflow.config in the current directory
5. The config file named nextflow.config in the workflow project directory
6. The config file $HOME/.nextflow/config
7. Values defined within the pipeline script itself (e.g. main.nf)

### Use a param file:

The `params.json` file

```{json}
{
  "input": "data/yeast/reads/ref1_1.fq.gz",
  "sleep": 2
}
```

The script call:

```{bash}
nextflow run wc.nf -params-file params.json
```

## Session 4 - Channels

```{nextflow}
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

// Value channels: bound to a single value, re-usable any number of times
ch1 = Channel.value("GRCh38")
ch2 = Channel.value(["chr1","chr2","chrX"])
ch3 = Channel.value(["chr": "chr1", "start": 10000, "stop":12000])
ch2.view()

// Queue channel: emits each item once
chr_channel = Channel.of("chr1","chr2","chrX")
chr_channel.view()
```

If you use the factory `Channel.fromSRA()`, you will need an API key to access the data, see: https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/

Channels docs: https://www.nextflow.io/docs/latest/channel.html#channel-factory

## Session 5 - Processes

In a file called `process.nf`

```{nextflow}
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

// salmon index -t <fasta_transcriptome> -i <out_dir> --kmerLen 29
process INDEX {
  script:
  """
  salmon index -t ${projectDir}/data/yeast/transcriptome/Saccharomyces_cerevisiae.R64-1-1.cdna.all.fa.gz \
    -i ${projectDir}/data/yeast/salmon_index --kmerLen 29
  """
}

workflow {
  INDEX()
}
```

```{bash}
nextflow run process.nf
```

You can use conditionals in processes, _e.g._ by placing the `if` before the script string so that Nextflow (not bash) evaluates it:

```{nextflow}
script:
// The condition is Groovy code; each branch returns its own script string
if ( params.aligner == "salmon" )
  """
  salmon index -t ${projectDir}/data/yeast/transcriptome/Saccharomyces_cerevisiae.R64-1-1.cdna.all.fa.gz \
    -i ${projectDir}/data/yeast/salmon_index --kmerLen 29
  """
else if (...)
  """
  ...
  """
```

## Session 6 - Processes Part 2

## Session 7 - Workflow

## Session 8 - Operators

## Session 9 - Nextflow configuration
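
Picking up the Q&A on keeping configuration tidy, here is a minimal sketch of what a `nextflow.config` could look like when settings are split with `includeConfig`. It is only an illustration under stated assumptions: the path `conf/cluster.config` is hypothetical, and `slurm` is just one of the queueing systems mentioned above, not necessarily what the course machine uses.

```{nextflow}
// nextflow.config (sketch; the included file name and the executor are assumptions)

// Pull further settings from a separate file (hypothetical path)
includeConfig 'conf/cluster.config'

// Let Singularity run the container declared in each process
singularity {
    enabled = true
}

// Submit every process to a queueing system instead of running locally
process {
    executor = 'slurm'
}
```

With such a file in the current directory, a call like `nextflow run wc.nf -params-file params.json` should pick it up automatically (item 4 in the answer to Exercise 2), while the params themselves stay in their own file.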
