wvdt

@wvdt

Joined on May 16, 2023

  • This tutorial assumes you have installed Docker correctly! See https://hackmd.io/@wvdt/docker for how to do that.
    Let's look at the results for Run1, barcode08. This is a positive control sample.

        DIR=/home/$USER/workshop_data/groupA
        cd "$DIR"
        # look at the quality of the run with pycoQC
        conda activate pycoQC
  • Questions:
      - With reassortment of gene segments being a common event in avian influenza virus (AIV) evolution, does it make sense to use a reference-based mapping approach for constructing consensus genome sequences for AIV samples?
      - Is it possible to reuse existing tools and workflows developed for the analysis of sequencing data from other viruses?
    Objectives:
      - Determine how reassortment impacts reference-based mapping approaches.
      - Use a collection of per-segment reference sequences to construct a hybrid reference genome that is sufficiently close to a sequenced sample to be useful as a reference for mapping.
      - Construct a sample consensus genome from mapped reads.
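The hybrid-reference objective in the avian influenza note can be sketched in plain shell. Map the reads against all per-segment candidate references at once. Then, for each segment, keep the candidate that attracted the most reads. This is a minimal sketch under stated assumptions and is not the tutorial's own workflow. It assumes a tab-separated table in the format printed by `samtools idxstats` (reference name, length, mapped reads, unmapped reads). It also assumes a hypothetical naming scheme in which every reference name starts with its segment, e.g. `PB2|cand_A`. The counts below are made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Made-up samtools-idxstats-style table (name, length, mapped, unmapped).
# Reference names are ASSUMED to follow a hypothetical "SEGMENT|candidate" scheme.
idxstats=$(printf '%s\t%s\t%s\t%s\n' \
  'PB2|cand_A' 2341 1200 0 \
  'PB2|cand_B' 2341 150  0 \
  'HA|H5_cand' 1778 40   0 \
  'HA|H9_cand' 1778 980  0 \
  '*'          0    0    12)

# For each segment, report the candidate reference with the most mapped reads.
best_per_segment() {
  awk -F'\t' '
    $1 == "*" { next }                    # idxstats line for unmapped reads
    {
      seg = $1
      sub(/[|].*/, "", seg)               # keep only the segment prefix
      n = $3 + 0                          # mapped-read count as a number
      if (!(seg in best) || n > max[seg]) {
        best[seg] = $1
        max[seg] = n
      }
    }
    END { for (seg in best) print seg "\t" best[seg] "\t" max[seg] }
  ' | sort
}

best_per_segment <<< "$idxstats"
```

With these made-up numbers, PB2 would be taken from `cand_A` and HA from `H9_cand`, so the hybrid reference mixes candidates, as reassortment would require. Ranking by raw mapped-read counts is only one possible criterion. It ignores, for example, segment length and mapping quality.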
  • This tutorial follows the 'Avian Influenza, Hybrid reference mapping' tutorial. For instructions on how to acquire and process the data, please refer to that tutorial.
    Data
    We are going to work with the paired-end Sanger reads in ~/workshop_data/avian_influenza.
    Quality Control
    We already did this! Yay.
  • Docker is a tool that allows developers to easily deploy their applications in a sandbox (called a container) that runs on the host operating system, i.e. Linux. The key benefit of Docker is that it lets users package an application with all of its dependencies into a standardized unit for software development.
    Terminology
    In the last section, we used a lot of Docker-specific jargon, which might be confusing to some. So before we go further, let me clarify some terminology that is used frequently in the Docker ecosystem.
      - Images: the blueprints of our application, which form the basis of containers. In the demo above, we used the docker pull command to download the busybox image.
      - Containers: created from Docker images, they run the actual application. We create a container using docker run, which we did with the busybox image that we downloaded. A list of running containers can be seen using the docker ps command.
      - Docker Daemon: the background service running on the host that manages building, running and distributing Docker containers. The daemon is the process running in the operating system that clients talk to.
      - Docker Client: the command-line tool that allows the user to interact with the daemon. More generally, there can be other forms of clients too, such as Kitematic, which provides a GUI.
      - Docker Hub: a registry of Docker images. You can think of the registry as a directory of all available Docker images. If required, one can host their own Docker registries and use them for pulling images.
  • Who is it for?
    If you work in a scenario where you need to process hundreds of sequenced DNA/RNA samples every week, then Nextflow is for you! Many bioinformatics tasks, for example running FastQC, run sequentially by default and eat up a lot of compute time while you wait to run the next step in your workflow. For example, to run FastQC on all of your FASTQ files, you may use a for loop like the one below:

        for i in *.fastq.gz ; do fastqc "$i" ; done

    Note that this loop is sequential: it processes only one file at a time. If FastQC takes 10 minutes per FASTQ file, the total adds up very quickly when you process hundreds of samples.
    Why use Nextflow?
    Many bioinformatics pipelines comprise various tools that are chained together in a dataflow programming manner to get the final results. Some of these tools are accessible from the Bash interpreter. Some are written in Python and require a Python interpreter, and some are written in R and require the R console for data analysis. Bioinformaticians and data analysts also often run their own custom scripts on their samples to get the desired results.
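The cost of the sequential loop in the Nextflow note can be demonstrated without installing FastQC. The sketch below swaps `fastqc` for a hypothetical stand-in function, `fake_qc`, that sleeps for one second per file. It runs the same five placeholder files first one at a time, then as background jobs joined with `wait`. It only illustrates why parallel execution matters and is not how Nextflow itself works. Nextflow additionally takes care of scheduling, resource limits and software environments.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Five empty placeholder files standing in for real FASTQ files.
for s in 1 2 3 4 5; do : > "sample${s}.fastq.gz"; done

# Hypothetical stand-in for fastqc: takes ~1 s per file and writes a small report.
fake_qc() {
  sleep 1
  echo "done" > "${1%.fastq.gz}_qc.txt"
}

# Sequential: one file at a time, roughly 5 s in total.
start=$SECONDS
for i in *.fastq.gz; do fake_qc "$i"; done
seq_time=$(( SECONDS - start ))

rm -f ./*_qc.txt

# Parallel: start every file as a background job, then wait for all of them.
start=$SECONDS
for i in *.fastq.gz; do fake_qc "$i" & done
wait
par_time=$(( SECONDS - start ))

echo "sequential: ${seq_time}s, parallel: ${par_time}s"
```

Launching one background job per file is fine for five files. With hundreds of samples it would oversubscribe the machine, which is one reason to let a workflow manager decide how many tasks run at once.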
  • Questions:
      - How do you perform quality control of raw data?
      - Which quality parameters should be checked for a dataset?
      - How can the quality of a dataset be improved?
    Objectives:
      - Assess short-read FASTQ quality using FastQC.
      - Assess long-read FASTQ quality using NanoPlot.
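To make the "quality parameters" in the quality control note concrete: FastQC and NanoPlot report, among other things, the number of reads, read lengths and per-base Phred quality scores. The sketch below computes those three numbers for a tiny made-up FASTQ using `awk`. It assumes the standard Phred+33 encoding, where a base's quality score is the ASCII code of its quality character minus 33. It is a teaching aid only, not a replacement for either tool. It also simply averages Phred scores, whereas long-read tools such as NanoPlot typically convert scores to error probabilities before averaging.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Two made-up reads in FASTQ format (4 lines per record).
fastq=$(cat <<'EOF'
@read1
ACGTACGTAC
+
IIIIIIIIII
@read2
ACGTA
+
!!!!!
EOF
)

# Print read count, mean read length and mean per-base Phred score.
fastq_summary() {
  awk '
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }  # char -> ASCII code
    NR % 4 == 2 { reads++; bases += length($0) }                      # sequence line
    NR % 4 == 0 {                                                      # quality line
      for (j = 1; j <= length($0); j++) {
        qsum += ord[substr($0, j, 1)] - 33                             # Phred+33 decoding
      }
    }
    END {
      printf "reads=%d mean_length=%.1f mean_quality=%.1f\n",
             reads, bases / reads, qsum / bases
    }
  '
}

fastq_summary <<< "$fastq"
```

Here `I` decodes to Q40 and `!` to Q0, so the 15 bases average 400 / 15 ≈ 26.7. A single low-quality read can pull the mean down noticeably, which is why the QC tools also show per-position and per-read distributions.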
  • What exactly is Conda?
      - A package and environment manager
      - Like apt/yum, but much more flexible
      - Environments are isolated from each other
      - User-contributed package recipes
      - Different “channels”, can create your own
      - Updated constantly
      - Prebuilt binaries
  • Small Bash cheat sheet

        # change directory
        cd /scratch/$USER
        # show content of current directory
        ls
        # show the path of the current directory
        pwd
        # make a new directory called 'myfolder'
        mkdir myfolder
        # show full content of a file